
1.

Microbiome composition and function within the Kellet's whelk perivitelline fluid.

Daniels, Benjamin N; Nurge, Jenna; De Smet, Chanel; Sleeper, Olivia; White, Crow; Davidson, Jean M; Fidopiastis, Pat.

Microbiol Spectr ; : e0351423, 2024 Feb 09.

Article En | MEDLINE | ID: mdl-38334378

Microbiomes have gained significant attention in ecological research, owing to their diverse interactions and essential roles within different organismal ecosystems. Microorganisms, such as bacteria, archaea, and viruses, have a profound impact on host health, influencing digestion, metabolism, immune function, tissue development, and behavior. This study investigates the microbiome diversity and function of Kellet's whelk (Kelletia kelletii) perivitelline fluid (PVF), which sustains thousands of developing K. kelletii embryos within a polysaccharide and protein matrix. Our core microbiome analysis reveals a diverse range of bacteria, with the Roseobacter genus being the most abundant. Additionally, genes related to host-microbe interactions, symbiosis, and quorum sensing were detected, indicating a potential symbiotic relationship between the microbiome and Kellet's whelk embryos. Furthermore, the microbiome exhibits gene expression related to antibiotic biosynthesis, suggesting a defensive role against pathogenic bacteria and potential for the discovery of novel antibiotics. Overall, this study sheds light on the microbiome's role in Kellet's whelk development, emphasizing the significance of host-microbe interactions in vulnerable life history stages. To our knowledge, ours is the first study to use 16S sequencing coupled with RNA sequencing (RNA-seq) to profile the microbiome of an invertebrate PVF. IMPORTANCE: This study provides novel insight into an encapsulated system with strong evidence of symbiosis between the microbial inhabitants and developing host embryos. The Kellet's whelk perivitelline fluid (PVF) contains microbial organisms of interest that may be providing symbiotic functions and potential antimicrobial properties during this vulnerable life history stage. This study, the first to utilize a comprehensive approach to investigating the Kellet's whelk PVF microbiome, couples 16S rRNA gene long-read sequencing with RNA-seq. This research contributes to and expands our knowledge of the roles of beneficial host-associated microbes.

2.

Genomic DNA extraction optimization and validation for genome sequencing using the marine gastropod Kellet's whelk.

Daniels, Benjamin N; Nurge, Jenna; Sleeper, Olivia; Lee, Andy; López, Cataixa; Christie, Mark R; Toonen, Robert J; White, Crow; Davidson, Jean M.

PeerJ ; 11: e16510, 2023.

Article En | MEDLINE | ID: mdl-38077446

Next-generation sequencing technologies, such as Nanopore MinION, Illumina HiSeq and NovaSeq, and PacBio Sequel II, hold immense potential for advancing genomic research on non-model organisms, including the vast majority of marine species. However, application of these technologies to marine invertebrate species is often impeded by challenges in extracting and purifying their genomic DNA due to high polysaccharide content and other secondary metabolites. In this study, we help resolve this issue by developing and testing DNA extraction protocols for Kellet's whelk (Kelletia kelletii), a subtidal gastropod with ecological and commercial importance, by comparing four DNA extraction methods commonly used in marine invertebrate studies. In our comparison of extraction methods, the Salting Out protocol was the least expensive, produced the highest DNA yields, produced consistently high DNA quality, and had low toxicity. We validated the protocol using an independent set of tissue samples, then applied it to extract high-molecular-weight (HMW) DNA from over three thousand Kellet's whelk tissue samples. The protocol demonstrated scalability and, with added clean-up, suitability for RAD-seq, GT-seq, and whole genome sequencing using both long-read (ONT MinION) and short-read (Illumina NovaSeq) sequencing platforms. Our findings offer a robust and versatile DNA extraction and clean-up protocol for supporting genomic research on non-model marine organisms, to help mediate the under-representation of invertebrates in genomic studies.

Gastropoda , Animals , Gastropoda/genetics , Genome/genetics , Genomics , DNA/genetics , Sequence Analysis, DNA/methods

3.

De novo genome assembly and comparative genomics for the colonial ascidian Botrylloides violaceus.

Sumner, Jack T; Andrasz, Cassidy L; Johnson, Christine A; Wax, Sarah; Anderson, Paul; Keeling, Elena L; Davidson, Jean M.

G3 (Bethesda) ; 13(10), 2023 09 30.

Article En | MEDLINE | ID: mdl-37555394

Ascidians have the potential to reveal fundamental biological insights related to coloniality, regeneration, immune function, and the evolution of these traits. This study implements a hybrid assembly technique to produce a genome assembly and annotation for the botryllid ascidian, Botrylloides violaceus. A hybrid genome assembly was produced using Illumina short-read and Oxford Nanopore Technologies long-read sequencing. The resulting assembly comprises 831 contigs, has a total length of 121 Mbp, an N50 of 1 Mbp, and a BUSCO score of 96.1%. Genome annotation identified 13 K protein-coding genes. Comparative genomic analysis with other tunicates reveals patterns of conservation and divergence within orthologous gene families even among closely related species. Characterization of the Wnt gene family, encoding signaling ligands involved in development and regeneration, reveals conserved patterns of subfamily presence and gene copy number among botryllids. This supports the use of genomic data from nonmodel organisms in the investigation of biological phenomena.

Urochordata , Animals , Urochordata/genetics , Genomics/methods , Genome , Gene Dosage , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation
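The N50 reported in the abstract above is a standard contiguity statistic: the contig length L such that contigs of length L or longer together cover at least half of the assembly. The following is a minimal sketch of that calculation, not the authors' code; the function name `n50` and the toy contig lengths are assumptions made for illustration only.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Toy data (hypothetical, not from the B. violaceus assembly).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
```

Applied to a real assembly, the input would be the list of contig lengths read from the assembly FASTA; a higher N50 at a similar total length generally indicates a more contiguous assembly.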

4.

The ENCODE Uniform Analysis Pipelines.

Hitz, Benjamin C; Lee, Jin-Wook; Jolanki, Otto; Kagda, Meenakshi S; Graham, Keenan; Sud, Paul; Gabdank, Idan; Strattan, J Seth; Sloan, Cricket A; Dreszer, Timothy; Rowe, Laurence D; Podduturi, Nikhil R; Malladi, Venkat S; Chan, Esther T; Davidson, Jean M; Ho, Marcus; Miyasato, Stuart; Simison, Matt; Tanaka, Forrest; Luo, Yunhai; Whaling, Ian; Hong, Eurie L; Lee, Brian T; Sandstrom, Richard; Rynes, Eric; Nelson, Jemma; Nishida, Andrew; Ingersoll, Alyssa; Buckley, Michael; Frerker, Mark; Kim, Daniel S; Boley, Nathan; Trout, Diane; Dobin, Alex; Rahmanian, Sorena; Wyman, Dana; Balderrama-Gutierrez, Gabriela; Reese, Fairlie; Durand, Neva C; Dudchenko, Olga; Weisz, David; Rao, Suhas S P; Blackburn, Alyssa; Gkountaroulis, Dimos; Sadr, Mahdi; Olshansky, Moshe; Eliaz, Yossi; Nguyen, Dat; Bochkov, Ivan; Shamim, Muhammad Saad.

Res Sq ; 2023 Jul 19.

Article En | MEDLINE | ID: mdl-37503119

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.

5.

The ENCODE Uniform Analysis Pipelines.

Hitz, Benjamin C; Lee, Jin-Wook; Jolanki, Otto; Kagda, Meenakshi S; Graham, Keenan; Sud, Paul; Gabdank, Idan; Strattan, J Seth; Sloan, Cricket A; Dreszer, Timothy; Rowe, Laurence D; Podduturi, Nikhil R; Malladi, Venkat S; Chan, Esther T; Davidson, Jean M; Ho, Marcus; Miyasato, Stuart; Simison, Matt; Tanaka, Forrest; Luo, Yunhai; Whaling, Ian; Hong, Eurie L; Lee, Brian T; Sandstrom, Richard; Rynes, Eric; Nelson, Jemma; Nishida, Andrew; Ingersoll, Alyssa; Buckley, Michael; Frerker, Mark; Kim, Daniel S; Boley, Nathan; Trout, Diane; Dobin, Alex; Rahmanian, Sorena; Wyman, Dana; Balderrama-Gutierrez, Gabriela; Reese, Fairlie; Durand, Neva C; Dudchenko, Olga; Weisz, David; Rao, Suhas S P; Blackburn, Alyssa; Gkountaroulis, Dimos; Sadr, Mahdi; Olshansky, Moshe; Eliaz, Yossi; Nguyen, Dat; Bochkov, Ivan; Shamim, Muhammad Saad.

bioRxiv ; 2023 Apr 06.

Article En | MEDLINE | ID: mdl-37066421

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.

6.

Reducing variability of breast cancer subtype predictors by grounding deep learning models in prior knowledge.

Anderson, Paul; Gadgil, Richa; Johnson, William A; Schwab, Ella; Davidson, Jean M.

Comput Biol Med ; 138: 104850, 2021 11.

Article En | MEDLINE | ID: mdl-34536702

Deep learning neural networks have improved performance in many cancer informatics problems, including breast cancer subtype classification. However, many networks experience underspecification where multiple combinations of parameters achieve similar performance, both in training and validation. Additionally, certain parameter combinations may perform poorly when the test distribution differs from the training distribution. Embedding prior knowledge from the literature may address this issue by boosting predictive models that provide crucial, in-depth information about a given disease. Breast cancer research provides a wealth of such knowledge, particularly in the form of subtype biomarkers and genetic signatures. In this study, we draw on past research on breast cancer subtype biomarkers, label propagation, and neural graph machines to present a novel methodology for embedding knowledge into machine learning systems. We embed prior knowledge into the loss function in the form of inter-subject distances derived from a well-known published breast cancer signature. Our results show that this methodology reduces predictor variability on state-of-the-art deep learning architectures and increases predictor consistency leading to improved interpretation. We find that pathway enrichment analysis is more consistent after embedding knowledge. This novel method applies to a broad range of existing studies and predictive models. Our method moves the traditional synthesis of predictive models from an arbitrary assignment of weights to genes toward a more biologically meaningful approach of incorporating knowledge.

Breast Neoplasms , Deep Learning , Breast Neoplasms/genetics , Female , Humans , Machine Learning , Neural Networks, Computer
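The abstract above describes embedding prior knowledge into the loss function as inter-subject distances. One common way to do this, in the spirit of neural graph machines, is to add a penalty that pulls together the learned representations of subjects that the prior signature considers similar. The sketch below illustrates that general idea only; it is not the authors' formulation. The function names, the Gaussian conversion from distance to weight, and the parameters `alpha` and `sigma` are all assumptions.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knowledge_regularized_loss(probs, labels, embeddings, prior_distance,
                               alpha=0.1, sigma=1.0):
    """Supervised loss plus a prior-knowledge penalty (illustrative only).

    probs[i]             predicted class probabilities for subject i
    labels[i]            true class index for subject i
    embeddings[i]        hidden representation of subject i
    prior_distance[i][j] distance between subjects i and j under a
                         published signature (assumed precomputed)
    """
    n = len(labels)
    supervised = sum(cross_entropy(p, y) for p, y in zip(probs, labels)) / n

    penalty, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed weighting: subjects close under the prior get weight near 1.
            weight = math.exp(-prior_distance[i][j] ** 2 / (2 * sigma ** 2))
            penalty += weight * squared_distance(embeddings[i], embeddings[j])
            pairs += 1
    if pairs:
        penalty /= pairs
    return supervised + alpha * penalty
```

With `alpha=0` the loss reduces to plain cross-entropy; increasing `alpha` trades some fit to the labels for agreement with the prior distances, which is one way such a term could reduce variability across training runs.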

7.

Author Correction: An atlas of dynamic chromatin landscapes in mouse fetal development.

Gorkin, David U; Barozzi, Iros; Zhao, Yuan; Zhang, Yanxiao; Huang, Hui; Lee, Ah Young; Li, Bin; Chiou, Joshua; Wildberg, Andre; Ding, Bo; Zhang, Bo; Wang, Mengchi; Strattan, J Seth; Davidson, Jean M; Qiu, Yunjiang; Afzal, Veena; Akiyama, Jennifer A; Plajzer-Frick, Ingrid; Novak, Catherine S; Kato, Momoe; Garvin, Tyler H; Pham, Quan T; Harrington, Anne N; Mannion, Brandon J; Lee, Elizabeth A; Fukuda-Yuzawa, Yoko; He, Yupeng; Preissl, Sebastian; Chee, Sora; Han, Jee Yun; Williams, Brian A; Trout, Diane; Amrhein, Henry; Yang, Hongbo; Cherry, J Michael; Wang, Wei; Gaulton, Kyle; Ecker, Joseph R; Shen, Yin; Dickel, Diane E; Visel, Axel; Pennacchio, Len A; Ren, Bing.

Nature ; 589(7842): E4, 2021 Jan.

Article En | MEDLINE | ID: mdl-33398137

8.

Author Correction: An atlas of dynamic chromatin landscapes in mouse fetal development.

Gorkin, David U; Barozzi, Iros; Zhao, Yuan; Zhang, Yanxiao; Huang, Hui; Lee, Ah Young; Li, Bin; Chiou, Joshua; Wildberg, Andre; Ding, Bo; Zhang, Bo; Wang, Mengchi; Strattan, J Seth; Davidson, Jean M; Qiu, Yunjiang; Afzal, Veena; Akiyama, Jennifer A; Plajzer-Frick, Ingrid; Novak, Catherine S; Kato, Momoe; Garvin, Tyler H; Pham, Quan T; Harrington, Anne N; Mannion, Brandon J; Lee, Elizabeth A; Fukuda-Yuzawa, Yoko; He, Yupeng; Preissl, Sebastian; Chee, Sora; Han, Jee Yun; Williams, Brian A; Trout, Diane; Amrhein, Henry; Yang, Hongbo; Cherry, J Michael; Wang, Wei; Gaulton, Kyle; Ecker, Joseph R; Shen, Yin; Dickel, Diane E; Visel, Axel; Pennacchio, Len A; Ren, Bing.

Nature ; 586(7831): E31, 2020 Oct.

Article En | MEDLINE | ID: mdl-33037424

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

9.

An atlas of dynamic chromatin landscapes in mouse fetal development.

Gorkin, David U; Barozzi, Iros; Zhao, Yuan; Zhang, Yanxiao; Huang, Hui; Lee, Ah Young; Li, Bin; Chiou, Joshua; Wildberg, Andre; Ding, Bo; Zhang, Bo; Wang, Mengchi; Strattan, J Seth; Davidson, Jean M; Qiu, Yunjiang; Afzal, Veena; Akiyama, Jennifer A; Plajzer-Frick, Ingrid; Novak, Catherine S; Kato, Momoe; Garvin, Tyler H; Pham, Quan T; Harrington, Anne N; Mannion, Brandon J; Lee, Elizabeth A; Fukuda-Yuzawa, Yoko; He, Yupeng; Preissl, Sebastian; Chee, Sora; Han, Jee Yun; Williams, Brian A; Trout, Diane; Amrhein, Henry; Yang, Hongbo; Cherry, J Michael; Wang, Wei; Gaulton, Kyle; Ecker, Joseph R; Shen, Yin; Dickel, Diane E; Visel, Axel; Pennacchio, Len A; Ren, Bing.

Nature ; 583(7818): 744-751, 2020 07.

Article En | MEDLINE | ID: mdl-32728240

The Encyclopedia of DNA Elements (ENCODE) project has established a genomic resource for mammalian development, profiling a diverse panel of mouse tissues at 8 developmental stages from 10.5 days after conception until birth, including transcriptomes, methylomes and chromatin states. Here we systematically examined the state and accessibility of chromatin in the developing mouse fetus. In total we performed 1,128 chromatin immunoprecipitation with sequencing (ChIP-seq) assays for histone modifications and 132 assay for transposase-accessible chromatin using sequencing (ATAC-seq) assays for chromatin accessibility across 72 distinct tissue-stages. We used integrative analysis to develop a unified set of chromatin state annotations, infer the identities of dynamic enhancers and key transcriptional regulators, and characterize the relationship between chromatin state and accessibility during developmental gene regulation. We also leveraged these data to link enhancers to putative target genes and demonstrate tissue-specific enrichments of sequence variants associated with disease in humans. The mouse ENCODE data sets provide a compendium of resources for biomedical researchers and achieve, to our knowledge, the most comprehensive view of chromatin dynamics during mammalian fetal development to date.

Chromatin/genetics , Chromatin/metabolism , Datasets as Topic , Fetal Development/genetics , Histones/metabolism , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Animals , Chromatin/chemistry , Chromatin Immunoprecipitation Sequencing , Disease/genetics , Enhancer Elements, Genetic/genetics , Female , Gene Expression Regulation, Developmental/genetics , Genetic Variation , Histones/chemistry , Humans , Male , Mice , Mice, Inbred C57BL , Organ Specificity/genetics , Reproducibility of Results , Transposases/metabolism

10.

Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts.

Frésard, Laure; Smail, Craig; Ferraro, Nicole M; Teran, Nicole A; Li, Xin; Smith, Kevin S; Bonner, Devon; Kernohan, Kristin D; Marwaha, Shruti; Zappala, Zachary; Balliu, Brunilda; Davis, Joe R; Liu, Boxiang; Prybol, Cameron J; Kohler, Jennefer N; Zastrow, Diane B; Reuter, Chloe M; Fisk, Dianna G; Grove, Megan E; Davidson, Jean M; Hartley, Taila; Joshi, Ruchi; Strober, Benjamin J; Utiramerur, Sowmithri; Lind, Lars; Ingelsson, Erik; Battle, Alexis; Bejerano, Gill; Bernstein, Jonathan A; Ashley, Euan A; Boycott, Kym M; Merker, Jason D; Wheeler, Matthew T; Montgomery, Stephen B.

Nat Med ; 25(6): 911-919, 2019 06.

Article En | MEDLINE | ID: mdl-31160820

It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases. This includes muscle biopsies from patients with undiagnosed rare muscle disorders, and cultured fibroblasts from patients with mitochondrial disorders. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.

Rare Diseases/genetics , Acid Ceramidase/genetics , Case-Control Studies , Child , Child, Preschool , Cohort Studies , Female , Genetic Variation , Humans , Male , Models, Genetic , Mutation , Oxidoreductases Acting on CH-CH Group Donors/genetics , Potassium Channels/genetics , RNA/blood , RNA/genetics , RNA Splicing/genetics , Rare Diseases/blood , Sequence Analysis, RNA , Exome Sequencing
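Comparing a patient's expression against a large control cohort, as the abstract above describes, is often done by asking how many standard deviations the patient's value lies from the control mean for each gene. The sketch below shows that generic Z-score idea; it is an assumption about one plausible approach, not the study's pipeline. The gene names, the expression values, and the threshold of 2.0 are hypothetical.

```python
from statistics import mean, stdev


def expression_z_score(case_value, control_values):
    """Z-score of one individual's expression relative to controls."""
    mu = mean(control_values)
    sd = stdev(control_values)  # sample standard deviation
    if sd == 0:
        raise ValueError("controls show no variation for this gene")
    return (case_value - mu) / sd


def flag_outlier_genes(case_expression, control_expression, threshold=2.0):
    """Return {gene: z} for genes whose |z| meets the (assumed) threshold."""
    flagged = {}
    for gene, value in case_expression.items():
        z = expression_z_score(value, control_expression[gene])
        if abs(z) >= threshold:
            flagged[gene] = z
    return flagged


# Hypothetical normalized expression values for two made-up genes.
controls = {
    "GENE_A": [10.0, 11.0, 9.0, 10.0, 10.0],
    "GENE_B": [10.0, 11.0, 9.0, 10.0, 10.0],
}
case = {"GENE_A": 10.0, "GENE_B": 2.0}
outliers = flag_outlier_genes(case, controls)
```

In this toy input only `GENE_B` is flagged, as an under-expression outlier. A larger control set tightens the estimate of the control mean and spread, which is one reason large control cohorts help with this kind of filtering.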

11.

Prevention of data duplication for high throughput sequencing repositories.

Gabdank, Idan; Chan, Esther T; Davidson, Jean M; Hilton, Jason A; Davis, Carrie A; Baymuradov, Ulugbek K; Narayanan, Aditi; Onate, Kathrina C; Graham, Keenan; Miyasato, Stuart R; Dreszer, Timothy R; Strattan, J Seth; Jolanki, Otto; Tanaka, Forrest Y; Hitz, Benjamin C; Sloan, Cricket A; Cherry, J Michael.

Database (Oxford) ; 2018, 2018 01 01.

Article En | MEDLINE | ID: mdl-29688363

Database URL: https://www.encodeproject.org/.

Data Curation/methods , Databases, Nucleic Acid/standards

12.

Biallelic Mutations in ATP5F1D, which Encodes a Subunit of ATP Synthase, Cause a Metabolic Disorder.

Oláhová, Monika; Yoon, Wan Hee; Thompson, Kyle; Jangam, Sharayu; Fernandez, Liliana; Davidson, Jean M; Kyle, Jennifer E; Grove, Megan E; Fisk, Dianna G; Kohler, Jennefer N; Holmes, Matthew; Dries, Annika M; Huang, Yong; Zhao, Chunli; Contrepois, Kévin; Zappala, Zachary; Frésard, Laure; Waggott, Daryl; Zink, Erika M; Kim, Young-Mo; Heyman, Heino M; Stratton, Kelly G; Webb-Robertson, Bobbie-Jo M; Snyder, Michael; Merker, Jason D; Montgomery, Stephen B; Fisher, Paul G; Feichtinger, René G; Mayr, Johannes A; Hall, Julie; Barbosa, Ines A; Simpson, Michael A; Deshpande, Charu; Waters, Katrina M; Koeller, David M; Metz, Thomas O; Morris, Andrew A; Schelley, Susan; Cowan, Tina; Friederich, Marisa W; McFarland, Robert; Van Hove, Johan L K; Enns, Gregory M; Yamamoto, Shinya; Ashley, Euan A; Wangler, Michael F; Taylor, Robert W; Bellen, Hugo J; Bernstein, Jonathan A; Wheeler, Matthew T.

Am J Hum Genet ; 102(3): 494-504, 2018 03 01.

Article En | MEDLINE | ID: mdl-29478781

ATP synthase, H+ transporting, mitochondrial F1 complex, δ subunit (ATP5F1D; formerly ATP5D) is a subunit of mitochondrial ATP synthase and plays an important role in coupling proton translocation and ATP production. Here, we describe two individuals, each with homozygous missense variants in ATP5F1D, who presented with episodic lethargy, metabolic acidosis, 3-methylglutaconic aciduria, and hyperammonemia. Subject 1, homozygous for c.245C>T (p.Pro82Leu), presented with recurrent metabolic decompensation starting in the neonatal period, and subject 2, homozygous for c.317T>G (p.Val106Gly), presented with acute encephalopathy in childhood. Cultured skin fibroblasts from these individuals exhibited impaired assembly of F1FO ATP synthase and subsequent reduced complex V activity. Cells from subject 1 also exhibited a significant decrease in mitochondrial cristae. Knockdown of Drosophila ATPsynδ, the ATP5F1D homolog, in developing eyes and brains caused a near complete loss of the fly head, a phenotype that was fully rescued by wild-type human ATP5F1D. In contrast, expression of the ATP5F1D c.245C>T and c.317T>G variants rescued the head-size phenotype but recapitulated the eye and antennae defects seen in other genetic models of mitochondrial oxidative phosphorylation deficiency. Our data establish c.245C>T (p.Pro82Leu) and c.317T>G (p.Val106Gly) in ATP5F1D as pathogenic variants leading to a Mendelian mitochondrial disease featuring episodic metabolic decompensation.

Alleles , Metabolic Diseases/genetics , Mitochondrial Proton-Translocating ATPases/genetics , Mutation/genetics , Protein Subunits/genetics , Amino Acid Sequence , Base Sequence , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Loss of Function Mutation/genetics , Male , Mitochondria/metabolism , Mitochondria/ultrastructure , Mitochondrial Proton-Translocating ATPases/chemistry , Protein Subunits/chemistry

13.

The Encyclopedia of DNA elements (ENCODE): data portal update.

Davis, Carrie A; Hitz, Benjamin C; Sloan, Cricket A; Chan, Esther T; Davidson, Jean M; Gabdank, Idan; Hilton, Jason A; Jain, Kriti; Baymuradov, Ulugbek K; Narayanan, Aditi K; Onate, Kathrina C; Graham, Keenan; Miyasato, Stuart R; Dreszer, Timothy R; Strattan, J Seth; Jolanki, Otto; Tanaka, Forrest Y; Cherry, J Michael.

Nucleic Acids Res ; 46(D1): D794-D801, 2018 01 04.

Article En | MEDLINE | ID: mdl-29126249

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly-processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to meta(data) from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.

DNA/genetics , Databases, Genetic , Gene Components , Genomics , High-Throughput Nucleotide Sequencing , Metadata , Animals , Caenorhabditis elegans/genetics , Data Display , Datasets as Topic , Drosophila melanogaster/genetics , Forecasting , Genome, Human , Humans , Mice/genetics , User-Computer Interface

14.

SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.

Hitz, Benjamin C; Rowe, Laurence D; Podduturi, Nikhil R; Glick, David I; Baymuradov, Ulugbek K; Malladi, Venkat S; Chan, Esther T; Davidson, Jean M; Gabdank, Idan; Narayana, Aditi K; Onate, Kathrina C; Hilton, Jason; Ho, Marcus C; Lee, Brian T; Miyasato, Stuart R; Dreszer, Timothy R; Sloan, Cricket A; Strattan, J Seth; Tanaka, Forrest Y; Hong, Eurie L; Cherry, J Michael.

PLoS One ; 12(4): e0175310, 2017.

Article En | MEDLINE | ID: mdl-28403240

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source; code and installation instructions can be found at: http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ (to store genomic data in the manner of ENCODE). The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.

Databases, Genetic , Genomics/methods , Metadata , Software , Animals , DNA/genetics , Genome , Humans , Mice
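The abstract above mentions an API for querying metadata. As a hedged illustration, the sketch below builds a search URL for the public ENCODE Portal, which accepts query-string filters and can return JSON. The helper name `build_search_url` is hypothetical, and the specific filter used in the example (`assay_title`) is an assumption about which fields a reader might query; consult the portal's own documentation for the authoritative list. The code only constructs the URL and performs no network request.

```python
from urllib.parse import urlencode

# Base address of the ENCODE Portal, as given in the records above.
ENCODE_BASE = "https://www.encodeproject.org"


def build_search_url(object_type, limit=5, **filters):
    """Build a metadata search URL that asks the portal for JSON.

    object_type  metadata object type to search, e.g. "Experiment"
    limit        maximum number of results to request
    filters      extra field=value filters (field names are assumptions)
    """
    params = [("type", object_type), ("limit", str(limit)), ("format", "json")]
    # Sort filters so the same query always yields the same URL.
    params.extend(sorted(filters.items()))
    return f"{ENCODE_BASE}/search/?{urlencode(params)}"


example_url = build_search_url("Experiment", limit=2, assay_title="ATAC-seq")
```

The resulting URL could then be fetched with any HTTP client that sends an `Accept: application/json` header; the fetch is left out here so the sketch stays runnable offline.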

15.

Principles of metadata organization at the ENCODE data coordination center.

Hong, Eurie L; Sloan, Cricket A; Chan, Esther T; Davidson, Jean M; Malladi, Venkat S; Strattan, J Seth; Hitz, Benjamin C; Gabdank, Idan; Narayanan, Aditi K; Ho, Marcus; Lee, Brian T; Rowe, Laurence D; Dreszer, Timothy R; Roe, Greg R; Podduturi, Nikhil R; Tanaka, Forrest; Hilton, Jason A; Cherry, J Michael.

Database (Oxford) ; 2016, 2016.

Article En | MEDLINE | ID: mdl-26980513

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org.

Computational Biology/methods , DNA/genetics , Databases, Genetic , Algorithms , Animals , Caenorhabditis elegans , Computational Biology/standards , Data Collection , Drosophila melanogaster , High-Throughput Nucleotide Sequencing , Humans , Mice , Nucleic Acids/genetics , Quality Control , Reproducibility of Results , Sequence Alignment

16.

Resources for the Comprehensive Discovery of Functional RNA Elements.

Sundararaman, Balaji; Zhan, Lijun; Blue, Steven M; Stanton, Rebecca; Elkins, Keri; Olson, Sara; Wei, Xintao; Van Nostrand, Eric L; Pratt, Gabriel A; Huelga, Stephanie C; Smalec, Brendan M; Wang, Xiaofeng; Hong, Eurie L; Davidson, Jean M; Lécuyer, Eric; Graveley, Brenton R; Yeo, Gene W.

Mol Cell ; 61(6): 903-13, 2016 Mar 17.

Article En | MEDLINE | ID: mdl-26990993

Transcriptome-wide maps of RNA binding protein (RBP)-RNA interactions by immunoprecipitation (IP)-based methods such as RNA IP (RIP) and crosslinking and IP (CLIP) are key starting points for evaluating the molecular roles of the thousands of human RBPs. A significant bottleneck to the application of these methods in diverse cell lines, tissues, and developmental stages is the availability of validated IP-quality antibodies. Using IP followed by immunoblot assays, we have developed a validated repository of 438 commercially available antibodies that interrogate 365 unique RBPs. In parallel, 362 short-hairpin RNA (shRNA) constructs against 276 unique RBPs were also used to confirm specificity of these antibodies. These antibodies can characterize subcellular RBP localization. With the burgeoning interest in the roles of RBPs in cancer, neurobiology, and development, these resources are invaluable to the broad scientific community. Detailed information about these resources is publicly available at the ENCODE portal (https://www.encodeproject.org/).

Databases, Genetic , RNA-Binding Proteins/genetics , RNA/metabolism , Transcriptome/genetics , Binding Sites , Humans , Protein Binding , RNA/genetics , RNA, Small Interfering/classification , RNA, Small Interfering/genetics , RNA-Binding Proteins/metabolism

17.

ENCODE data at the ENCODE portal.

Sloan, Cricket A; Chan, Esther T; Davidson, Jean M; Malladi, Venkat S; Strattan, J Seth; Hitz, Benjamin C; Gabdank, Idan; Narayanan, Aditi K; Ho, Marcus; Lee, Brian T; Rowe, Laurence D; Dreszer, Timothy R; Roe, Greg; Podduturi, Nikhil R; Tanaka, Forrest; Hong, Eurie L; Cherry, J Michael.

Nucleic Acids Res ; 44(D1): D726-32, 2016 Jan 04.

Article En | MEDLINE | ID: mdl-26527727

The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.

Databases, Genetic , Genome, Human , Genomics , Animals , DNA/metabolism , Genes , Humans , Mice , Proteins/metabolism , RNA/metabolism

18.

Integrative Genomics Implicates EGFR as a Downstream Mediator in NKX2-1 Amplified Non-Small Cell Lung Cancer.

Clarke, Nicole; Biscocho, Jewison; Kwei, Kevin A; Davidson, Jean M; Sridhar, Sushmita; Gong, Xue; Pollack, Jonathan R.

PLoS One ; 10(11): e0142061, 2015.

Article En | MEDLINE | ID: mdl-26556242

NKX2-1, encoding a homeobox transcription factor, is amplified in approximately 15% of non-small cell lung cancers (NSCLC), where it is thought to drive cancer cell proliferation and survival. However, its mechanism of action remains largely unknown. To identify relevant downstream transcriptional targets, here we carried out a combined NKX2-1 transcriptome (NKX2-1 knockdown followed by RNAseq) and cistrome (NKX2-1 binding sites by ChIPseq) analysis in four NKX2-1-amplified human NSCLC cell lines. While NKX2-1 regulated genes differed among the four cell lines assayed, cell proliferation emerged as a common theme. Moreover, in 3 of the 4 cell lines, epidermal growth factor receptor (EGFR) was among the top NKX2-1 upregulated targets, which we confirmed at the protein level by western blot. Interestingly, EGFR knockdown led to upregulation of NKX2-1, suggesting a negative feedback loop. Consistent with this finding, combined knockdown of NKX2-1 and EGFR in NCI-H1819 lung cancer cells reduced cell proliferation (as well as MAP-kinase and PI3-kinase signaling) more than knockdown of either alone. Likewise, NKX2-1 knockdown enhanced the growth-inhibitory effect of the EGFR-inhibitor erlotinib. Taken together, our findings implicate EGFR as a downstream effector of NKX2-1 in NKX2-1 amplified NSCLC, with possible clinical implications, and provide a rich dataset for investigating additional mediators of NKX2-1 driven oncogenesis.

Carcinoma, Non-Small-Cell Lung/genetics , ErbB Receptors/genetics , Gene Expression Regulation, Neoplastic , Lung Neoplasms/genetics , Nuclear Proteins/genetics , Transcription Factors/genetics , Antineoplastic Agents/pharmacology , Carcinoma, Non-Small-Cell Lung/pathology , Cell Line, Tumor , Cell Proliferation/drug effects , Cell Proliferation/genetics , Erlotinib Hydrochloride/pharmacology , Humans , Lung Neoplasms/pathology , Protein Kinase Inhibitors/pharmacology , Signal Transduction/drug effects , Signal Transduction/genetics , Thyroid Nuclear Factor 1 , Up-Regulation/drug effects , Up-Regulation/genetics

19.

Ontology application and use at the ENCODE DCC.

Malladi, Venkat S; Erickson, Drew T; Podduturi, Nikhil R; Rowe, Laurence D; Chan, Esther T; Davidson, Jean M; Hitz, Benjamin C; Ho, Marcus; Lee, Brian T; Miyasato, Stuart; Roe, Gregory R; Simison, Matt; Sloan, Cricket A; Strattan, J Seth; Tanaka, Forrest; Kent, W James; Cherry, J Michael; Hong, Eurie L.

Database (Oxford) ; 2015, 2015.

Article En | MEDLINE | ID: mdl-25776021

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC's use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects.

Data Curation/methods , Databases, Genetic , Gene Ontology , Gene Regulatory Networks/physiology , Molecular Sequence Annotation/methods , Transcription, Genetic/physiology , Animals , Humans , Mice

20.

Doctors' age at domestic partnership and parenthood: cohort studies.

Goldacre, Michael J; Davidson, Jean M; Lambert, Trevor W.

J R Soc Med ; 105(9): 390-9, 2012 Sep.

Article En | MEDLINE | ID: mdl-22977049

OBJECTIVE: To report on doctors' family formation. DESIGN: Cohort studies using structured questionnaires. SETTING: UK. PARTICIPANTS: Doctors who qualified in 1988, 1993, 1996, 1999, 2000 and 2002 were followed up. MAIN OUTCOME MEASURES: Living with spouse or partner; and doctors' age when first child was born. RESULTS: The response to surveys including questions about domestic circumstances was 89.8% (20,717/23,077 doctors). The main outcomes - living with spouse or partner, and parenthood - varied according to age at qualification. Using the modal ages of 23-24 years at qualification, by the age of 24-25 (i.e. in their first year of medical work) a much smaller percentage of doctors than the general population was living with spouse or partner. By the age of 33, 75% of both women and men doctors were living with spouse or partner, compared with 68% of women and 61% of men aged 33 in the general population. By the age of 24-25, 2% of women doctors and 41% of women in the general population had a child; but women doctors caught up with the general population, in this respect, in their 30s. The specialty with the highest percentage of women doctors who, aged 35, had children was general practice (74%); the lowest was surgery (41%). CONCLUSIONS: Doctors are more likely than other people to live with a spouse or partner, and to have children, albeit typically at later ages. Differences between specialties in rates of motherhood may indicate sacrifice by some women of family in favour of career.

Family , Parenting , Physicians/statistics & numerical data , Spouses/statistics & numerical data , Adult , Age Factors , Cohort Studies , Female , Humans , Male , Medicine/statistics & numerical data , Surveys and Questionnaires , United Kingdom , Young Adult